The Deep Web: Surfacing Hidden Value
Abstract
Traditional search engines create their indices by spidering or crawling surface-Web pages. To be discovered, a page must be static and linked from other pages. Traditional search engines cannot "see" or retrieve content in the deep Web: those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search-engine crawlers cannot probe beneath the surface, the deep Web has heretofore been hidden.
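The limitation described above can be illustrated with a minimal link-following crawler over a toy in-memory "web". The page names and the form-generated URL below are hypothetical, chosen only to show that a page produced dynamically by a form submission is never reached by hyperlink traversal alone:

```python
from collections import deque

# A toy "surface web": each static page lists the pages it links to.
# "/results?q=widgets" exists only as the output of a search-form
# submission, so no static page links to it (hypothetical URLs).
PAGES = {
    "/home": ["/about", "/search"],
    "/about": ["/home"],
    "/search": [],             # the search-form page has no static out-links
    "/results?q=widgets": [],  # deep-Web page: reachable only via the form
}

def crawl(seed):
    """Breadth-first crawl that follows only static hyperlinks."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

indexed = crawl("/home")
# "/results?q=widgets" is never discovered, although it "exists"
# in the sense that the site can generate it on demand.
```

Everything reachable by links ends up in `indexed`; the dynamically generated results page does not, which is exactly the gap the deep Web represents.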
Similar Resources
A Comparative Study of Hidden Web Crawlers
A large amount of data on the WWW remains inaccessible to the crawlers of Web search engines because it can only be exposed on demand, as users fill out and submit forms. The Hidden Web refers to the collection of Web data that a crawler can access only through interaction with a Web-based search form, not simply by traversing hyperlinks. Research on the Hidden Web has emerged almost ...
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web, accessing Deep-Web content has been a long-standing challenge for the database community. This paper describes a system for surfacing Deep-Web content, i.e., pre-computing submissions for each HTML...
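The surfacing idea mentioned in this abstract, pre-computing form submissions so that the resulting pages get ordinary crawlable URLs, can be sketched as follows. This is a simplified illustration, not the paper's system: the form action and field names are hypothetical, and only GET-style forms are modeled:

```python
from itertools import product
from urllib.parse import urlencode

def surface_urls(action, field_values):
    """Pre-compute GET submissions for an HTML form: one URL per
    combination of candidate input values. A sketch of the surfacing
    idea; `action` and the field names are hypothetical examples."""
    names = sorted(field_values)  # fixed field order for stable URLs
    for combo in product(*(field_values[n] for n in names)):
        yield action + "?" + urlencode(dict(zip(names, combo)))

urls = list(surface_urls("/search", {"q": ["cars", "books"], "lang": ["en"]}))
# → ["/search?lang=en&q=cars", "/search?lang=en&q=books"]
```

Each generated URL names a concrete result page, so a conventional crawler can fetch and index it like any static page.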
Deep Webpage Classification and Extraction (DWCE)
Since Deep Web (or Hidden Web) information is hidden behind search query forms, it can only be accessed by interacting with those forms. Therefore, an automated system that interacts with the search forms and extracts the hidden Web pages would be of great value to human users. To accomplish this task, this paper proposes a novel method, "Deep Webpage ...
A Bootstrapping Approach to Classification of Deep Web Query Interfaces
Classification of Deep Web sources is a very important step in the data-extraction process when accessing Deep Web content, since it deals with domain-specific data only. Existing methods cannot effectively classify these Web databases. Hence, to solve this problem, we propose a new framework that uses a bootstrapping approach for automatic and accurate classification of the query...
A Task-specific Approach for Crawling the Deep Web
There is a great amount of valuable information on the Web that cannot be accessed by conventional crawler engines. This portion of the Web is usually known as the Deep Web or the Hidden Web. Most probably, the information of highest value in the deep Web is that behind Web forms. In this paper, we describe a prototype hidden-Web crawler able to access such content. Our approach is b...
Publication date: 2000